This document provides an overview of basic concepts in network analysis and graph theory, with practical examples implemented in the R programming language.

1. Introduction to Network Analysis

First, load the required packages:

library(igraph) # Core package for network analysis
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(threejs) # 3D network visualization

The igraph package is essential for network analysis in R. It provides efficient, high-level functions for network manipulation, analysis, and visualization. On the other hand, the threejs package is used to create interactive 3D visualizations of our networks.

Loading and Inspecting Data

Here, the dataset ‘friends.csv’ is loaded and converted into a graph object, g, using the graph.edgelist() function. This function reads in a list of edges from a matrix, creating an undirected graph (since directed = FALSE).

We can inspect the following key properties of our network using the following commands:

  • V(g): Displays the vertices (or nodes) of the graph.
  • E(g): Displays the edges of the graph, where each edge represents a connection or relationship between two vertices.
  • gsize(g): Returns the number of edges in the graph, which can give an idea of the interconnectedness of the network.
  • gorder(g): Returns the number of nodes in the graph, indicating the size of the network.
# Loading dataset
friends <- read.csv('friends.csv')
head(friends)
##    name1  name2
## 1 Jessie Sidney
## 2 Jessie  Britt
## 3 Sidney  Britt
## 4 Sidney Donnie
## 5   Karl  Berry
## 6 Sidney   Rene
# Load edgelist df as graph
friends.mat <- as.matrix(friends)
g <- graph.edgelist(friends.mat,
                    directed = FALSE)

# Inspecting basic properties of graph data
V(g); E(g) # vertices / edges
## + 16/16 vertices, named, from 3b9f61d:
##  [1] Jessie  Sidney  Britt   Donnie  Karl    Berry   Rene    Shayne  Elisha 
## [10] Whitney Odell   Lacy    Eugene  Jude    Rickie  Tommy
## + 27/27 edges from 3b9f61d (vertex names):
##  [1] Jessie --Sidney  Jessie --Britt   Sidney --Britt   Sidney --Donnie 
##  [5] Karl   --Berry   Sidney --Rene    Britt  --Rene    Sidney --Shayne 
##  [9] Sidney --Elisha  Sidney --Whitney Jessie --Whitney Donnie --Odell  
## [13] Sidney --Odell   Rene   --Whitney Donnie --Shayne  Jessie --Lacy   
## [17] Rene   --Lacy    Elisha --Eugene  Eugene --Jude    Berry  --Odell  
## [21] Odell  --Rickie  Karl   --Odell   Britt  --Lacy    Elisha --Jude   
## [25] Whitney--Lacy    Britt  --Whitney Karl   --Tommy
gsize(g) # no. of edges
## [1] 27
gorder(g) # no. of nodes
## [1] 16
# Plot network
plot(g)

Adding Vertex and Edge Attributes

In addition to the structure of the network (i.e., nodes and their connections), often, we need to incorporate additional data associated with the nodes (vertices) or the connections (edges). In igraph, these are handled through vertex and edge attributes:

genders <- c("M", "F", "F", "M", "M", "M", "F", "M",
             "M", "F", "M", "F", "M", "F", "M", "M")
ages <- c(18, 19, 21, 20, 22, 18, 23, 21,
          22, 20, 20, 22, 21, 18, 19, 20)

g <- set_vertex_attr(g, "gender", value = genders)
g <- set_vertex_attr(g, "age", value = ages)

vertex_attr(g) # get vertex attr
## $name
##  [1] "Jessie"  "Sidney"  "Britt"   "Donnie"  "Karl"    "Berry"   "Rene"   
##  [8] "Shayne"  "Elisha"  "Whitney" "Odell"   "Lacy"    "Eugene"  "Jude"   
## [15] "Rickie"  "Tommy"  
## 
## $gender
##  [1] "M" "F" "F" "M" "M" "M" "F" "M" "M" "F" "M" "F" "M" "F" "M" "M"
## 
## $age
##  [1] 18 19 21 20 22 18 23 21 22 20 20 22 21 18 19 20

The code above creates new vertex attributes, namely gender and age, and assigns them to the graph g using the set_vertex_attr function. The vertex_attr(g) function can then be used to retrieve all vertex attributes.

Similarly, we can create and add edge attributes to our graph:

hours <- c(1, 2, 2, 1, 2, 5, 5, 1, 1,
           3, 2, 1, 1, 5, 1, 2, 4, 1,
           3, 1, 1, 1, 4, 1, 3, 3, 4)

g <- set_edge_attr(g, "hours", value = hours)
edge_attr(g)
## $hours
##  [1] 1 2 2 1 2 5 5 1 1 3 2 1 1 5 1 2 4 1 3 1 1 1 4 1 3 3 4
E(g)[[.inc('Britt')]] # Find all edges that include "Britt"
## + 5/27 edges from 3b9f61d (vertex names):
##      tail    head tid hid hours
## 2  Jessie   Britt   1   3     2
## 3  Sidney   Britt   2   3     2
## 7   Britt    Rene   3   7     5
## 23  Britt    Lacy   3  12     4
## 26  Britt Whitney   3  10     3
E(g)[[hours>=4]]  # Find all pairs that spend 4 or more hours together per week
## + 6/27 edges from 3b9f61d (vertex names):
##      tail    head tid hid hours
## 6  Sidney    Rene   2   7     5
## 7   Britt    Rene   3   7     5
## 14   Rene Whitney   7  10     5
## 17   Rene    Lacy   7  12     4
## 23  Britt    Lacy   3  12     4
## 27   Karl   Tommy   5  16     4

The hours edge attribute represents the number of hours each pair of friends spends together per week. We can use edge attributes to query our graph, such as finding all edges (friend pairs) that include “Britt” or pairs that spend 4 or more hours together per week.

Loading a Graph with Node and Edge Attributes from CSV Files

We can also load graph data directly from files that include node and edge attributes:

# Load dataframe with edge and node attributes
friends1_edges <- read.csv('friends1_edges.csv')
friends1_nodes <- read.csv('friends1_nodes.csv')

# Load graph from df
g1 <- graph_from_data_frame(d = friends1_edges,
          vertices = friends1_nodes,
          directed = FALSE)

E(g1)[[hours >= 5]]  # Subset edges greater than or equal to 5 hours
## + 4/25 edges from 5127ec2 (vertex names):
##         tail      head tid hid hours
## 5     Kelley Valentine   3   6     5
## 8     Ronald   Jasmine   4   8     5
## 12 Valentine     Perry   6  15     5
## 15   Jasmine      Juan   8   9     6
V(g1)$color <- ifelse(V(g1)$gender == "F", # Set vertex color by gender
                      "orange", "dodgerblue")

# Visualizing attributes
plot(g1, vertex.label.color = "black")

The graph_from_data_frame function is used to create a graph from two data frames: one that contains edge attributes (friends1_edges) and one that contains node attributes (friends1_nodes).

Next, we set the color of each node (vertex) based on the gender attribute. Finally, the network is visualized with plot(), which now includes the new color attribute.

The added attributes can provide richer insights and improve the clarity of visualizations. For instance, here, we can immediately see the gender distribution in the network. Attributes can be any information that adds value to our analysis—demographic data, interaction frequency, weights of connections, etc.

Exploring Network Layouts

The igraph package offers a range of layout algorithms to choose from when visualizing a network. These layouts can drastically affect the readability and interpretability of the network:

## Exploring igraph network layouts
par(mfrow = c(1,2))
plot(g1, vertex.label.color = "black", layout = layout_in_circle(g1))
plot(g1, vertex.label.color = "black", layout = layout_with_fr(g1))

par(mfrow = c(1,2))
m <- layout_as_tree(g1)
plot(g1, vertex.label.color = "black", layout = m)
m1 <- layout_nicely(g1)
plot(g1, vertex.label.color = "black", layout = m1)

The layout_in_circle function places all vertices on a circle, while layout_with_fr applies the Fruchterman-Reingold layout algorithm, which positions nodes with more connections closer to the center.

The layout_as_tree function, as the name suggests, presents the network in a tree-like structure, and layout_nicely applies an appropriate layout based on the characteristics of the network.

Choosing an appropriate layout is crucial for effectively conveying the structure and relationships within the network.

Visualizing Edge Weights and Modifying Networks

Edge weights, represented as the hours attribute, can be visualized in the network graph to demonstrate the strength of relationships:

# Create a vector of weights based on the number of hours each pair spend together
w1 <- E(g1)$hours

# Plot the network varying edges by weights
m1 <- layout_nicely(g1)
plot(g1, 
     vertex.label.color = "black", 
     edge.color = 'black',
     edge.width = w1,
     layout = m1, 
     main = "Weighted graph")

The plot function is used to display the network graph, with the edge.width parameter set to w1 (the hours edge attribute). This causes the width of the edges in the graph to correspond to the number of hours each pair of friends spends together each week.

Next, we can create a new network graph where only relationships that are 2 hours or longer are displayed:

# Create a new igraph object by deleting edges that are less than 2 hours long 
g2 <- delete_edges(g1, E(g1)[hours < 2])

# Plot the new graph 
w2 <- E(g2)$hours
m2 <- layout_nicely(g2)

plot(g2, 
     vertex.label.color = "black", 
     edge.color = 'black',
     edge.width = w2,
     layout = m2,
     main = "Weighted (hours > 2)")

The delete_edges function is used to remove all edges (friend pairs) that spend less than 2 hours together per week, creating a new graph g2. This modified graph is then displayed, once again with edge widths corresponding to the hours edge attribute.

The ability to add, remove, and modify nodes and edges enables the construction of subgraphs, the exploration of specific network features, and the simulation of dynamic network changes.

2. Identifying important vertices in a network

In this chapter you will learn about directed networks. You will also learn how to identify key relationships between vertices in a network as well as how to use these relationships to identify important or influential vertices. Throughout this chapter you will use a network of measles transmission. The data come from the German city of Hagelloch in 1861. Each directed edge of the network indicates a child becoming infected with measles after coming into contact with an infected child.

Creating a Directed Graph

Creating a directed graph is straightforward using igraph’s graph_from_data_frame() function. The directed parameter is set to TRUE to indicate that the edges have a direction associated with them.

# Load measles dataframe
measles <- read.csv('measles.csv')

# Initialize new directed graph
g <- graph_from_data_frame(measles, directed = TRUE)

is.directed(g); is.weighted(g) # Check directed / weighted graph
## [1] TRUE
## [1] FALSE
table(head_of(g, E(g))) # Check edge origins
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  18  19  20  21 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  61  62  63 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 184 185 186 187 
##   1   1   1   1
# Exploratory plot of graph
plot(g, 
    vertex.label.color = "black", 
    edge.color = 'gray77',
    vertex.size = 0,
    edge.arrow.size = 0.1,
    layout = layout_nicely(g))

Inspecting Edges and their Directionality

Edges in a directed graph have a direction, going from one vertex to another. These directed connections can be inspected using the adjacency matrix or the incident() function:

# Is there an edge going from vertex 184 to vertex 178?
g['184', '178']
## [1] 1
# Is there an edge going from vertex 178 to vertex 184?
g['178', '184']
## [1] 0
incident(g, '184', mode = c("all")) # Show all edges going to or from vertex 184
## + 6/184 edges from 3127a58 (vertex names):
## [1] 184->45  184->182 184->181 184->178 184->183 184->177
incident(g, '184', mode = c("out")) # Show all edges going out from vertex 184
## + 6/184 edges from 3127a58 (vertex names):
## [1] 184->45  184->182 184->181 184->178 184->183 184->177

Neighboring Vertices and Intersections

The neighbors() function shows the adjacent vertices of a given node. The mode parameter specifies the direction of the edges to consider. The intersection() function can be used to identify vertices that share in- and out- connections:

##  Neighboring nodes
neighbors(g, '12', mode = c('all'))
## + 5/187 vertices, named, from 3127a58:
## [1] 45  13  72  89  109
# Identify other vertices that direct edges towards vertex 12
neighbors(g, '12', mode = c('in'))
## + 1/187 vertex, named, from 3127a58:
## [1] 45
# Identify any vertices that receive an edge from vertex 42 and direct an edge to vertex 124
n1 <- neighbors(g, '42', mode = c('out'))
n2 <- neighbors(g, '124', mode = c('in'))
intersection(n1, n2)
## + 1/187 vertex, named, from 3127a58:
## [1] 7

Calculating Distances

The igraph package provides functions for calculating the farthest vertices (farthest_vertices()), the longest traversal path (get_diameter()), and reachable vertices within a certain number of connections (ego()):

## Distances between vertices
farthest_vertices(g) # diameter / longest traversal length
## $vertices
## + 2/187 vertices, named, from 3127a58:
## [1] 184 162
## 
## $distance
## [1] 5
get_diameter(g) # path sequence between furthest vertices
## + 6/187 vertices, named, from 3127a58:
## [1] 184 178 42  7   123 162
# Identify vertices that are reachable within two connections from vertex 42
ego(g, 2, '42', mode = c('out'))
## [[1]]
## + 13/187 vertices, named, from 3127a58:
##  [1] 42  7   106 43  123 101 120 124 125 128 129 108 127
# Identify vertices that can reach vertex 42 within two connections
ego(g, 2, '42', mode = c('in'))
## [[1]]
## + 3/187 vertices, named, from 3127a58:
## [1] 42  178 184

Identifying Key Vertices

Key vertices (also known as “hubs”) are often of particular interest in network analysis. They can be identified based on their degree (the number of edges connected to a vertex):

## Identifying key vertices
g.outd <- degree(g, mode = c("out")) # Calculate the out-degree of each vertex

table(g.outd) # Summary of out-degree
## g.outd
##   0   1   2   3   4   6   7   8  30 
## 125  21  16  12   6   2   3   1   1
hist(g.outd, breaks = 30) # out-degree histogram

which.max(g.outd) # vertex with maximum out-degree
## 45 
##  1

In directed networks, it’s important to differentiate between in-degree (number of incoming edges) and out-degree (number of outgoing edges). Nodes with high out-degrees are often sources of information or influence within the network.

Betweenness Centrality

Betweenness centrality is a measure of a node’s centrality in a network. It equals the number of shortest paths from all vertices to all others that pass through that node. Betweenness centrality is a more sophisticated measure of centrality than simple degree, because it takes into account not just the quantity of connections a node has, but also the quality of those connections in terms of information flow across the entire network.

## Betweenness Centrality Measure
g.b <- betweenness(g, directed = TRUE)

# Show histogram of vertex betweenness
par(mfrow = c(1,2))
hist(g.b, breaks = 80)

# Create plot with vertex size determined by betweenness score
plot(g, 
     vertex.label = NA,
     edge.color = 'black',
     vertex.size = sqrt(g.b)+1, # normalization allows all nodes to be viewed, 
     edge.arrow.size = 0.05, # ... relative importance to be identifiable
     layout = layout_nicely(g),
     main = "Betweenness (node size)")

Visualization Based on Geodesic Distance

Geodesic distance in a graph is the shortest path between two vertices. It is often useful to visualize geodesic distances from a particular node of interest (in this case, vertex 184) to all other nodes. This allows for a clear view of the relative positioning and connectivity of nodes in the network.

## Visualizing important nodes and edges
# Make an ego graph (subset of network comprised of all nodes connected to node 184)
g184 <- make_ego_graph(g, diameter(g),
                       nodes = '184',
                       mode = c("all"))[[1]]

# Get a vector of geodesic distances of all vertices from vertex 184 
dists <- distances(g184, "184")

# Create a color palette of length equal to the maximal geodesic distance plus one.
colors <- c("black", "red", "orange",
                   "blue", "dodgerblue", "cyan")

# Set color attribute to vertices of network g184.
V(g184)$color <- colors[dists+1]

# Visualize the network based on geodesic distance from vertex 184 (patient zero).
plot(g184, 
    vertex.label = dists, 
    vertex.label.color = "white",
    vertex.label.cex = .6,
    edge.color = 'black',
    vertex.size = 7,
    edge.arrow.size = .05,
    main = "Geodesic Distances from Patient Zero"
)

This approach helps visualize not just the network structure, but also how influence, information, or a contagion might propagate through the network from this starting point (often referred to as “patient zero” in the context of disease transmission).

3. Characterizing Network Structures

This module will show how to characterize global network structures and sub-structures. It will also introduce generating random network graphs.

Forrest Gump Network

Now, let’s construct and analyze a network from the “Forrest Gump” dataset. To measure the relative importance of each node, we’ll calculate their eigenvector centrality. Nodes with high eigenvector centrality are often influential nodes within the network because they are not only well connected themselves but also connected to other well-connected nodes.

## Forrest Gump Network
gump <- read.csv('gump.csv')
head(gump)
##              V1        V2
## 1 ABBIE HOFFMAN     JENNY
## 2 ABBIE HOFFMAN POLICEMAN
## 3     ANCHORMAN   FORREST
## 4     ANCHORMAN    LT DAN
## 5     ANCHORMAN     MARGO
## 6     ANCHORMAN  MRS GUMP
# Make an undirected network
g <- graph_from_data_frame(gump, directed = FALSE)

# Identify key nodes using eigenvector centrality
g.ec <- eigen_centrality(g)
which.max(g.ec$vector) # identify node with highest eigencentrality score
## FORREST 
##      36
# Plot Forrest Gump Network
plot(g,
     vertex.label.color = "black", 
     vertex.label.cex = 0.6,
     vertex.size = 25*(g.ec$vector),
     edge.color = 'gray88',
     main = "Forrest Gump Network"
)

Next, we’ll explore the density of the graph, which is a measure of the actual number of edges versus the potential number of edges, the diameter of the graph (the longest path distance), and the average path length.

# Get density of a graph (proportion of current edges vs all potential edges)
gd <- edge_density(g)

# Get the diameter of the graph g (longest path distance)
diameter(g, directed = FALSE)
## [1] 4
# Get the average path length of the graph g
g.apl <- mean_distance(g, directed = FALSE)
g.apl
## [1] 1.994967

Random Graphs

Random graphs are useful for comparing our actual network to what we would expect by chance. We will create one random graph, g.random, which has the same number of nodes and edges as our original graph.

## Random Graphs
# Create one random graph with the same number of nodes and edges as g
g.random <- erdos.renyi.game(n = gorder(g), p.or.m = gd, type = "gnp")
g.random
## IGRAPH 8f1719f U--- 94 279 -- Erdos-Renyi (gnp) graph
## + attr: name (g/c), type (g/c), loops (g/l), p (g/n)
## + edges from 8f1719f:
##  [1]  6-- 7  7-- 9  3--11  9--12  1--13  8--16  1--17  5--17  8--17 12--18
## [11]  8--19 12--20  5--21 10--21  1--23  7--23  3--24  5--25  9--25 17--25
## [21] 23--25 24--25  2--26  3--26  4--26 14--26 23--26  1--27 23--27 24--27
## [31] 17--28 15--29  4--30 22--30 25--30 28--30 29--30  2--31  9--31 18--31
## [41]  3--32 12--32 11--34 14--34  9--35 14--35 33--35 34--35 15--36  8--37
## [51] 15--37 24--37  7--38 22--38 32--38  9--39 22--39 31--39 34--40 11--41
## [61] 12--41 14--41 18--41  7--42 22--42 29--42 40--42 11--43 25--43 28--43
## [71] 30--43  1--44 38--44 42--44 32--45 36--45 41--45 29--46 31--46 32--46
## + ... omitted several edges
plot(g.random)

# Get density of new random graph `g.random`
edge_density(g.random)
## [1] 0.06382979
# Get the average path length of the random graph g.random
mean_distance(g.random, directed = FALSE)
## [1] 2.713795

Randomization Test

To statistically evaluate the structure of our network, we can perform a randomization test. This involves generating a large number of random networks and comparing the average path length of these networks to our original network.

## Randomization Test
# Generate 1000 random graphs
gl <- vector('list', 1000)

for(i in 1:1000){
  gl[[i]] <- erdos.renyi.game(n = gorder(g), p.or.m = gd, type = "gnp")
}

# Calculate average path length of 1000 random graphs
gl.apls <- unlist(lapply(gl, mean_distance, directed = FALSE))

# Plot the distribution of average path lengths
hist(gl.apls, xlim = range(c(1.5, 6)))
abline(v = g.apl, col = "red", lty = 3, lwd = 2)

# Calculate the proportion of graphs with an average path length lower than our observed
mean(gl.apls < g.apl)
## [1] 0

The red vertical line in the histogram represents the average path length of our original network. The proportion of random graphs with an average path length less than ours can be interpreted as a p-value. If this value is small (e.g., less than 0.05), we would conclude that our network’s structure is significantly different from a random network.

The Forrest Gump network is far more interconnected than we would expect by chance as zero random networks have an average path length smaller than the Forrest Gump network’s average path length.

Network Substructures (Triangles, Transitivity and Cliques)

Triangles in a network are substructures consisting of three nodes that are all connected to each other. Transitivity is a measure of the overall tendency to form triangles within a network. Cliques are subnetworks where every node is connected to every other node.

## Network Substructures
matrix(triangles(g), nrow = 3)[,1:5] # matrix of all possible triangles
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   36   36   36   36   36
## [2,]    1    1    1    1    2
## [3,]   83   38   39   66   68
count_triangles(g, vids='BUBBA') # Count the number of triangles that vertex "BUBBA" is in.
## [1] 37
g.tr <- transitivity(g) # Calculate global transitivity of the network.
g.tr
## [1] 0.1918082
transitivity(g, vids='BUBBA', type = "local") # Calculate the local transitivity for vertex BUBBA.
## [1] 0.6727273

Transitivity Randomizations

To test whether the transitivity of our network is greater than expected by chance, we can compare the observed transitivity to that of a set of random networks.

## Transitivity Randomizations
gl.tr <- lapply(gl, transitivity) # Calculate average transitivity of 1000 random graphs
gl.trs <- unlist(gl.tr)

summary(gl.trs) # Get summary statistics of transitivity scores
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02698 0.05347 0.06117 0.06124 0.06866 0.09677
mean(gl.trs > g.tr) # Calculate the proportion of graphs with a transitivity score higher than our observed
## [1] 0

Cliques

Cliques are subgroups where every node is connected to every other node. Let’s identify the largest cliques in our network.

## Cliques
largest_cliques(g) # Identify the largest cliques in the network
## [[1]]
## + 9/94 vertices, named, from f5bce86:
## [1] EMCEE   FORREST MEN     MAN #5  MAN #3  MAN #2  MAN #1  MAN #   JENNY  
## 
## [[2]]
## + 9/94 vertices, named, from f5bce86:
## [1] FORREST   LT DAN    STRONGARM SONG      SOLDIER   SGT SIMS  MAN      
## [8] DALLAS    BUBBA
clq <- max_cliques(g) # Determine all maximal cliques in the network

table(unlist(lapply(clq, length))) # Calculate the size of each maximal clique.
## 
##  2  3  4  5  6  7  9 
## 12 24  7  2  4  2  2
lc <- largest_cliques(g) # Assign largest cliques output to object 'lc'
gs1 <- as.undirected(subgraph(g, lc[[1]])) # Create new undirected subgraph containing only the vertices of largest clique.
gs2 <- as.undirected(subgraph(g, lc[[2]]))

# Plot the two largest cliques side-by-side
par(mfrow=c(1,2))
plot(gs1,
    vertex.label.color = "black", 
    vertex.label.cex = 0.9,
    vertex.size = 0,
    edge.color = 'gray28',
    main = "Largest Clique 1",
    layout = layout.circle(gs1)
)

plot(gs2,
    vertex.label.color = "black", 
    vertex.label.cex = 0.9,
    vertex.size = 0,
    edge.color = 'gray28',
    main = "Largest Clique 2",
    layout = layout.circle(gs2)
)

4. Identifying Special Relationships

This chapter will further explore the partitioning of networks into sub-networks and determining which vertices are more highly related to one another than others. You will also develop visualization methods by creating three-dimensional visualizations.

Close Relationships

Assortativity measures whether nodes tend to connect to similar or dissimilar nodes. In this case, we will examine assortativity in terms of gender and degree.

## Close Relationships
plot(g1) # Plot the network

values <- as.numeric(factor(V(g1)$gender)) # Convert the gender attribute into a numeric value

assortativity(g1, values) # Calculate the assortativity of the network based on gender
## [1] 0.1319444
assortativity.degree(g1, directed = FALSE) # Calculate the assortativity degree of the network
## [1] 0.4615385
observed.assortativity <- assortativity(g1, values) # Calculate the observed assortativity

results <- vector('list', 1000) # Calculate the assortativity of the network randomizing the gender attribute 1000 times
for(i in 1:1000){
  results[[i]] <- assortativity(g1

, sample(values))
}

# Plot the distribution of assortativity values and add a red vertical line at the original observed value
hist(unlist(results))
abline(v = observed.assortativity, col = "red", lty = 3, lwd=2)

In the context of directed networks, reciprocity is a measure of the likelihood of vertices to be mutually linked. It’s a measure of ‘mutuality’ of edges in the graph.

# Make a plot of the chimp grooming network
plot(g,
     edge.color = "black",
     edge.arrow.size = 0.3,
     edge.arrow.width = 0.5)

# Calculate the reciprocity of the graph
reciprocity(g)
## [1] 1

This will output the proportion of mutual connections, or reciprocal edges, in relation to the total number of edges, in the directed network graph ‘g’.

Community Detection

Community detection is an important task in network analysis which deals with the identification of densely connected groups of nodes. Two algorithms are commonly used for community detection: the fast-greedy algorithm and the edge-betweenness algorithm.

## Community Detection
kc = fastgreedy.community(g) # Perform fast-greedy community detection on network graph
sizes(kc) # Determine sizes of each community
## Community sizes
##  1  2  3  4  5  6  7 
## 21 21 32  7  5  5  3
membership(kc) # Determine which individuals belong to which community
##            ABBIE HOFFMAN                ANCHORMAN                ANNOUNCER 
##                        6                        1                        1 
##              ANOTHER DAY          ASSISTANT COACH                     BERT 
##                        1                        3                        5 
##                    BILLY              BLACK WOMAN                 BOB HOPE 
##                        2                        2                        1 
##                      BOY                   BOY #1                   BOY #2 
##                        2                        2                        2 
##                   BOY #3                    BUBBA BUS STOP - PRESENT - DAY 
##                        2                        1                        3 
##               CAB DRIVER                    CARLA             CHET HUNTLEY 
##                        2                        1                        3 
##                   DALLAS                     DEAN              DICK CAVETT 
##                        1                        3                        3 
##               DICK CLARK                   DOCTOR           DRILL SERGEANT 
##                        1                        3                        1 
##                    ELVIS                    EMCEE                    ERNIE 
##                        3                        4                        5 
##           FOOTBALL COACH             AGING HIPPIE            BLACK PANTHER 
##                        3                        3                        2 
##               BUS DRIVER                       DJ                   DRIVER 
##                        3                        2                        2 
##                     EARL            ELDERLY WOMAN                  FORREST 
##                        3                        1                        3 
##                     GIRL                   HILARY                   ISABEL 
##                        2                        6                        6 
##               FORREST JR                    JENNY                   LENORE 
##                        5                        2                        1 
##                   LOUISE                   LT DAN               LITTLE BOY 
##                        7                        1                        5 
##               MALE NURSE                    MAN #                   MAN #1 
##                        1                        4                        4 
##                   MAN #2                   MAN #3                    MASAI 
##                        4                        4                        2 
##                   MAN #5                 MINISTER               NEWSCASTER 
##                        4                        7                        3 
##         GOVERNOR WALLACE                  NEWSMAN                    NIGHT 
##                        3                        3                        1 
##             OLDER BOY #1                  OFFICER                 MRS GUMP 
##                        2                        3                        3 
##                    RUBEN                      MAN                 SGT SIMS 
##                        2                        1                        1 
##                  SOLDIER                     SONG                POLICEMAN 
##                        1                        1                        6 
##                    MARGO        PRESIDENT JOHNSON              WHITE WOMAN 
##                        1                        1                        2 
##              SLOW MOTION                STRONGARM              JOHN LENNON 
##                        2                        1                        3 
##                      MEN              JENNY'S DAD               LYNN MARIE 
##                        4                        2                        5 
##                 MRS BLUE                    NURSE             OLD SHRIMPER 
##                        3                        3                        3 
##             OLDER BOY #2        PRESIDENT KENNEDY                PRINCIPAL 
##                        2                        3                        3 
##                    SUSAN                      VET                   WESLEY 
##                        7                        6                        2 
##            WILD-EYED MAN             YOUNG HIPPIE                YOUNG MAN 
##                        3                        2                        3 
##               KATZENBACH                 REPORTER                OLDER BOY 
##                        3                        3                        3 
##                  PATRONS          PRESIDENT NIXON                 REVEREND 
##                        3                        3                        3 
##           SECURITY GUARD 
##                        3
plot(kc, g) # Plot the community structure of the network

gc = edge.betweenness.community(g) # Perform edge-betweenness community detection on network graph
sizes(gc)
## Community sizes
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  5  3  1 10  5 14  3  5  3  4  3  1  6  1  4  1  1  2  1  1  3  1  2  2  2  1 
## 27 28 29 30 31 32 33 34 35 
##  1  1  1  1  1  1  1  1  1
# Plot community networks determined by fast-greedy and edge-betweenness methods side-by-side
par(mfrow = c(1, 2)) 
plot(kc, g)
plot(gc, g)

Interactive Network Visualizations with threejs

The threejs package in R is used to create interactive 3D scatterplots and networks. We will use it to create an interactive version of our network graph.

## Interactive Network Visualizations with threejs
library(threejs)

# Set a vertex attribute called 'color' to 'dodgerblue' 
g <- set_vertex_attr(g, "color", value = "dodgerblue")
graphjs(g, vertex.size = 1)
# Create numerical vector of vertex eigenvector centralities 
ec <- as.numeric(eigen_centrality(g)$vector)
v <- 5*sqrt(ec) # scaled vertex size to eigenvector centralities
graphjs(g, vertex.size = v)
# Memberships of the fast-greedy community detection
i <- membership(kc)
sizes(kc) # Check the number of different communities
## Community sizes
##  1  2  3  4  5  6  7 
## 21 21 32  7  5  5  3
# Add a color attribute to each vertex, setting the vertex color based on community membership
g <- set_vertex_attr(g, "color", value = c("yellow", "blue", "red")[i])
graphjs(g)

This will render the graph as an interactive threejs plot in your RStudio viewer or in your web browser, with the vertices colored according to their community membership.